Temporal Difference-based Adaptive policies in Neuro-dynamic Programming
Abstract. Using the temporal difference method from neuro-dynamic programming, we construct an adaptive policy for finite-state Markov decision processes with the average reward criterion under the minorization condition. The value function is estimated by a learning iteration algorithm, and the adaptive policy is specified as an ε-forced modification of the greedy policy with respect to the estimated value function and the estimated transition probability matrix. A numerical experiment on the “Toymaker’s problem” illustrates the validity of the adaptive policy.
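The ingredients named in the abstract (a temporal difference estimate of the value function, an empirical transition matrix, and an ε-forced greedy policy) can be illustrated with a minimal Python sketch. This is not the paper's algorithm: the step sizes, the running-average gain estimate, and the count-based transition estimate are assumptions chosen for brevity, and the toymaker numbers are the commonly quoted values for Howard's example, which the abstract does not list. The function names `greedy_action`, `eps_forced`, and `run` are hypothetical.

```python
import random

# Assumed data: commonly quoted numbers for Howard's two-state, two-action
# toymaker example; the abstract itself does not list them.
P = {(0, 0): [0.5, 0.5], (0, 1): [0.8, 0.2],
     (1, 0): [0.4, 0.6], (1, 1): [0.7, 0.3]}   # P[s, a][j] = Pr(next = j)
R = {(0, 0): 6.0, (0, 1): 4.0, (1, 0): -3.0, (1, 1): -5.0}  # expected rewards
STATES, ACTIONS = (0, 1), (0, 1)


def greedy_action(s, h, p_hat):
    """Action maximizing r(s, a) + sum_j p_hat(j | s, a) * h(j)."""
    def score(a):
        return R[s, a] + sum(p * h[j] for j, p in enumerate(p_hat[s, a]))
    return max(ACTIONS, key=score)


def eps_forced(s, h, p_hat, eps):
    """Selection probabilities: eps per non-greedy action, the rest to greedy."""
    g = greedy_action(s, h, p_hat)
    return [1.0 - eps * (len(ACTIONS) - 1) if a == g else eps for a in ACTIONS]


def run(steps=20000, eps=0.05, seed=0):
    """Simulate the chain while learning h, the gain, and the transition matrix."""
    rng = random.Random(seed)
    counts = {sa: [1, 1] for sa in P}   # transition counts, started at 1
    h, gain, s = [0.0, 0.0], 0.0, 0     # relative values, average reward, state
    p_hat = {}
    for t in range(1, steps + 1):
        # Empirical transition matrix from the counts observed so far.
        p_hat = {sa: [c / sum(cs) for c in cs] for sa, cs in counts.items()}
        a = rng.choices(ACTIONS, weights=eps_forced(s, h, p_hat, eps))[0]
        s_next = rng.choices(STATES, weights=P[s, a])[0]
        counts[s, a][s_next] += 1
        # Average-reward TD(0) update (assumed step size t^-0.6).
        delta = R[s, a] - gain + h[s_next] - h[s]
        h[s] += t ** -0.6 * delta
        gain += (R[s, a] - gain) / t    # running average of observed rewards
        s = s_next
    return h, gain, p_hat
```

Calling `run()` returns the learned relative values, the average-reward estimate, and the estimated transition matrix. Forcing every action to be chosen with probability at least ε keeps all state-action pairs visited, so each row of the transition estimate keeps receiving samples.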
Similar resources
Quagent control via Passive and Active Learning
Artificial intelligence algorithms using passive and active learning versions of direct utility estimation, adaptive dynamic programming, and temporal difference approaches are used to simulate an agent. The explored worlds consisted of discrete states (positions) bounded by internally generated “walls” that included one or more terminal states and a predetermined configuration of rewards for each state...
Design and Simulation of Adaptive Neuro Fuzzy Inference Based Controller for Chaotic Lorenz System
Chaos is a nonlinear behavior that shows chaotic and irregular responses to internal and external stimuli in dynamic systems. This behavior usually appears in systems that are highly sensitive to initial conditions. In these systems, stabilization is an important tool for eliminating aberrant behaviors. In this paper, the problems of stabilization and chaos tracking are investigated....
Dynamic Modeling of the Electromyographic and Masticatory Force Relation Through Adaptive Neuro-Fuzzy Inference System Principal Dynamic Mode Analysis
Introduction: Researchers have employed surface electromyography (EMG) to study the human masticatory system and the relationship between the activity of masticatory muscles and the mechanical features of mastication. This relationship has several applications in food texture analysis, control of prosthetic limbs, rehabilitation, and teleoperated robots. Materials and Methods: In this paper, w...
Control of Multivariable Systems Based on Emotional Temporal Difference Learning Controller
One of the most important issues in controlling delayed systems and non-minimum phase systems is to fulfill objective orientations simultaneously and in the best way possible. In this paper, a new objective-orientation method is proposed for controlling multi-objective systems. The principles of this method are based on emotional temporal difference learning, and has a...
The CFD Provides Data for Adaptive Neuro-Fuzzy to Model the Heat Transfer in Flat and Discontinuous Fins
In the present study, the Adaptive Neuro–Fuzzy Inference System (ANFIS) approach was applied to predict the heat transfer and air flow pressure drop on flat and discontinuous fins. The heat transfer and friction characteristics were experimentally investigated in four flat and discontinuous fins with different geometric parameters, including fin length (r), fin interruption (s), fin pitch (p), ...